-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733
Conversation
5e1a175
to
38383e7
Compare
@reminisce @wkcn Could you help review? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can call MXNDArrayFromDLPack rather than MXNDArrayFromDLManagedTensor.
@reminisce Hey I have already addressed your comments. Could you review again? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! LGTM. Thanks for your contribution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome work!
Merged. Thank you! |
|
||
if not ndarray.flags['C_CONTIGUOUS']: | ||
raise ValueError("Only c-contiguous arrays are supported for zero-copy") | ||
c_obj = _make_dl_managed_tensor(ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when doing zero-copy, should the ownership of the data pointer be transferred to the new ndarray? do we need to update the writable flag? what happens when the original data is updated by numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szha They share the ownership - which means numpy's modification is transparent to mxnet and vice versa. As requested by @reminisce in his review, I added a testcase for this transparency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does it work when using the asynchronous engine? since numpy calls happen immediately, allowing shared ownership may give unexpected results. consider the following case:
a = np.array(...)
b = nd.from_numpy(a, zero_copy=True)
c = nd.expensive_op(b, in_place=True)
d = np.inplace_op(a) # this is called when c hasn't finished yet
given the asynchronous execution, it's hard to tell whether people should expect d to be based on the input before or after expensive_op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We actually have such discussion long time before, but it's good to make things open and achievable to the community)
My understanding is that your concern is a general question in multi-threading (see the code below). In any multi-threaded system, we may expect this issue to exist -- and MXNet has multi-threaded engine, so this issue exists, right?
We have discussion that how about the content of array is changed somewhere inside the system but user don't know that, should we set up the WRITEABLE
flag to be true to prevent users from writing, or should we completely disallow users to access this numpy array?
The question, in the high-level, is all about trade-off. Restricting the freedom of users may make things less error-prone, while giving users full freedom may make the system more powerful.
On one hand, in my humble opinion, restricting the freedom of users, like setting up WRITEABLE
flag, does not completely resolve this issue, because users can still encounter pitfalls when they read the array using the numpy side API. The only helps in this case is to completely disable users from read or write the array, which grant them on access only the numpy side - This is more like guaranteeing safety by disallowing users to create threads/processes.
On the other hand, giving users freedom enables them to do more things, and does not necessarily mean that they will definitely make a mistake, because we will encourage to use mxnet's numpy compatibility, which is on the way.
Anyway, I am open to any discussion and open to possible changes setting up several flags on the numpy array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API should have clear semantics regardless of users’ knowledge on how MXNet execution works. If you worry about flexibility I’d recommend making it an option to not disable writable flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we add a zero-copy option to asnumpy where the ownership is completely transferred back to numpy, I think there wouldn't be any need for co-ownership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following case:
# the variable `a` is a `mx.nd.NDArray` object
a = mx.nd.array([1,2,3])
b = a.asnumpy(zero_copy=True)
# users call `asnumpy(zero_copy=True)` twice
c = a.asnumpy(zero_copy=True)
c[:] = 3
print(a)
print(b)
a[:] = 5
a.wait_to_read()
print(b)
print(c)
It's necessary to keep co-ownership.
We can add extra option in from_numpy
and asnumpy
to prompt users to call the function wait_to_read()
.
Alternatively, can we add a hook into numpy.ndarray
? When users access numpy.ndarray
, wait_to_read()
will be called automatically.
Reference: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If zero-copy is on, then a
should not be usable after the asnumpy
call. The point is to let the framework deal with all synchronization, and co-ownership doesn't allow that. And for the same case, you can simply copy c
from b
(the new numpy array) instead of a
(the old ndarray), so it's not necessary to keep co-ownership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hook in the context seems to refer to our own subclass of numpy array. Creating a new subclass seems like overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Enable zero-copy from numpy to MXNet NDArray * should work * make lint happy * fix stupid typos * wip to address comments * Address comments * Address comments * Remove redundant code
* Enable zero-copy from numpy to MXNet NDArray * should work * make lint happy * fix stupid typos * wip to address comments * Address comments * Address comments * Remove redundant code
Description
This PR allows users to convert numpy's NDArray without copying content data. This is achieved by converting numpy's NDArray to
DLManagedTensor
, which is introduced in dmlc/dlpack#38. We expect that this would help users reduce major performance bottleneck when this conversion is conducted frequently.See also: #14244.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
NDArray::FromDLPack
to make it resource safe.Comments
Nothing.